MuMIC – Multimodal Embedding for Multi-Label Image Classification with Tempered Sigmoid


Abstract

Multi-label image classification is a foundational topic in various domains. Multimodal learning approaches have recently achieved outstanding results in image representation and single-label image classification. For instance, Contrastive Language-Image Pretraining (CLIP) demonstrates impressive image-text representation learning abilities and is robust to natural distribution shifts. This success inspires us to leverage multimodal learning for multi-label classification tasks and to benefit from contrastively learnt pretrained models. We propose the Multimodal Multi-label Image Classification (MuMIC) framework, which utilizes a hardness-aware tempered sigmoid based Binary Cross Entropy loss function and thus enables optimization on multi-label objectives and transfer learning on CLIP. MuMIC is capable of providing high classification performance, handling real-world noisy data, supporting zero-shot predictions, and producing domain-specific image embeddings. In this study, a total of 120 image classes are defined, and more than 140K positive annotations are collected on approximately 60K Booking.com images. The final MuMIC model is deployed on the Booking.com Content Intelligence Platform, and it outperforms other state-of-the-art models with 85.6% GAP@10 and 83.8% GAP on all 120 classes, as well as a 90.1% macro mAP score across 32 majority classes. We summarize the modelling choices, which are extensively tested through ablation studies. To the best of our knowledge, we are the first to adapt contrastively learnt multimodal pretraining for real-world multi-label image classification problems, and the innovation can be transferred to other domains.
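The abstract names a tempered sigmoid based Binary Cross Entropy loss but does not spell out its form. The sketch below shows one plausible reading, not the authors' implementation. It assumes that each class score is an image-text similarity, that the score is divided by a temperature before the sigmoid, and that the loss is averaged over classes. The function names, the default temperature of 0.1, and the example values are all hypothetical.

```python
import math


def tempered_sigmoid(score: float, temperature: float) -> float:
    """Sigmoid of score / temperature, evaluated in a numerically stable way."""
    z = score / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tempered_sigmoid_bce(scores, targets, temperature=0.1):
    """Mean binary cross entropy over the classes of a single image.

    scores      -- one similarity score per class (assumed image-text similarity)
    targets     -- 1 if the class is present in the image, else 0
    temperature -- assumed hyperparameter; smaller values sharpen the sigmoid
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for s, y in zip(scores, targets):
        p = tempered_sigmoid(s, temperature)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(scores)


# Hypothetical example: two classes, the first present and the second absent.
loss_matching = tempered_sigmoid_bce([0.8, -0.6], [1, 0])
loss_mismatched = tempered_sigmoid_bce([0.8, -0.6], [0, 1])
```

Because each class gets its own sigmoid rather than sharing a softmax, any number of labels can be active for one image, which is what a multi-label objective needs. Under this reading, a smaller temperature makes the loss react more strongly to confidently wrong scores.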


Similar resources

Multi-Task Label Embedding for Text Classification

Multi-task learning in text classification leverages implicit correlations among related tasks to extract common features and yield performance gains. However, most previous works treat the labels of each task as independent and meaningless one-hot vectors, which causes a loss of potential information and makes it difficult for these models to jointly learn three or more tasks. In this paper, we prop...


Matrix Completion for Multi-label Image Classification

Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data ma...


Multi-label Image Classification with A Probabilistic Label Enhancement Model

In this paper, we present a novel probabilistic label enhancement model to tackle the multi-label image classification problem. Recognizing multiple objects in images is a challenging problem due to label sparsity, appearance variations of the objects, and occlusions. We propose to tackle these difficulties from a novel perspective by constructing auxiliary labels in the output space. Our idea is to...


Multi-Label Image Classification with Regional Latent Semantic Dependencies

Deep convolutional neural networks (CNNs) have demonstrated advanced performance on single-label image classification, and progress has also been made in applying CNN methods to multi-label image classification, which requires annotating objects, attributes, scene categories, etc. in a single shot. Recent state-of-the-art approaches to multi-label image classification exploit the label depen...


Multi-Label Classification with Label Constraints

We extend the multi-label classification setting with constraints on labels. This leads to two new machine learning tasks: First, the label constraints must be properly integrated into the classification process to improve its performance and second, we can try to automatically derive useful constraints from data. In this paper, we experiment with two constraint-based correction approaches as p...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i13.26850